Geometric Perspectives of the BM25

نویسنده

  • Giorgio Maria Di Nunzio
چکیده

In this paper, we present the initial findings about a possible geometric interpretation of the BM25 model and a comparison of the BM25 with the Binary Independence Model (BIM) on a two-dimensional space. A Web application was developed in R to show an example of this geometric view on a standard TREC collection. The application is accessible at the following link: http://gmdn.shinyapps.io/shinyRF04

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Log-Logistic Model-Based Interpretation of TF Normalization of BM25

The effectiveness of BM25 retrieval function is mainly due to its sub-linear term frequency (TF) normalization component, which is controlled by a parameter k1. Although BM25 was derived based on the classic probabilistic retrieval model, it has been so far unclear how to interpret its parameter k1 probabilistically, making it hard to optimize the setting of this parameter. In this paper, we pr...

متن کامل

Score Distributions in Information Retrieval

We review the history of modeling score distributions, focusing on the mixture of normal-exponential by investigating the theoretical as well as the empirical evidence supporting its use. We discuss previously suggested conditions which valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two new hypotheses considering the component distribu...

متن کامل

Integrating the Probabilistic Models BM25/BM25F into Lucene

This document describes the BM25 and BM25F implementation using the Lucene Java Framework. The implementation described here can be downloaded from [Pérez-Iglesias 08a]. Both models have stood out at TREC by their performance and are considered as stateof-the-art in the IR community. BM25 is applied to retrieval on plain text documents, that is for documents that do not contain fields, while BM...

متن کامل

Microblog Processing: A Study

Sensing Microblog from retrieval and summarization become the challenging area for the Information retrieval community. Twitter is one of the most popular micro blogging platforms. In this paper, Twitter posts called tweets are studied from retrieval and extractive summarization perspectives. Given a set of topics or interest profiles or information requirement, a Microblog summarization system...

متن کامل

Improving the Sentiment Analysis Process of Spanish Tweets with BM25

The enormous growth of user-generated information of social networks has caused the need for new algorithms and methods for their classification. The Sentiment Analysis (SA) methods attempt to identify the polarity of a text, using among other resources, the ranking algorithms. One of the most popular ranking algorithms is the Okapi BM25 ranking, designed to rank documents according to their re...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015